Maximum entropy methods for biological sequence modeling
Abstract
Many of the modeling methods used for natural languages, specifically Markov models and HMMs, have also been applied to biological sequence analysis. In recent years, natural language models have been improved by maximum entropy methods, which allow information from the entire history of a sequence to be considered. This is in contrast to the Markov models that have been the standard for most biological sequence models, whose predictions are generally based on some fixed number of previous emissions. To test the utility of maximum entropy modeling for biological sequence analysis, we used these methods to model amino acid sequences. Our results show that there is significant long-distance information in amino acid sequences and suggest that maximum entropy techniques may be beneficial for a range of biological sequence analysis problems.
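For orientation, the contrast drawn in the abstract can be written out in standard notation. The abstract itself gives no formulas, so the symbols below (history h_t, feature functions f_i, weights \lambda_i, order k) are assumptions for illustration and are not taken from the paper. A fixed-order Markov model conditions only on the previous k emissions, while a conditional maximum entropy model can use features defined over the whole history:

```latex
% Order-k Markov model: the prediction depends on the last k emissions only
P(a_t \mid a_1, \dots, a_{t-1}) = P(a_t \mid a_{t-k}, \dots, a_{t-1})

% Conditional maximum entropy model: each feature f_i may inspect
% the entire history h_t = (a_1, \dots, a_{t-1})
P(a_t \mid h_t) =
  \frac{\exp\left( \sum_i \lambda_i f_i(h_t, a_t) \right)}
       {\sum_{a'} \exp\left( \sum_i \lambda_i f_i(h_t, a') \right)}
```

A long-distance feature might, for example, fire when a particular amino acid occurred anywhere earlier in the sequence, which a fixed-order model cannot express once that occurrence falls outside its window of k positions.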
Similar resources
Statistical Models for the Analysis of Heterogeneous Biological Data Sets
Eugen Buehler, Lyle Ungar. The focus of this thesis is on developing methods of integrating heterogeneous biological feature sets into structured statistical models, so as to improve model predictions and further understanding of the complex systems that they emulate. Combining data from different sources is an important ta...
Modeling and Performance of Waste Tires as Media in Fixed Bed Sequence Batch Reactor
Introduction: Modeling aims to simulate or optimize a process in physical, chemical or biological environments, and the derived model, if sufficiently suitable, provides considerable assistance in generating data and predicting unknown conditions. Unsuitable disposal and elimination of waste tires have polluted the environment and human living areas; they have also caused removal of a hu...
A Note on the Bivariate Maximum Entropy Modeling
Let X = (X1, X2) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for the vector X when there is partial information about the dependence structure between X1 and X2. The models, which are obtained based on the well-known Principle of Maximum Entropy, are called the maximum entropy (ME) mo...
Modeling of the Maximum Entropy Problem as an Optimal Control Problem and its Application to Pdf Estimation of Electricity Price
In this paper, the continuous optimal control theory is used to model and solve the maximum entropy problem for a continuous random variable. The maximum entropy principle provides a method to obtain least-biased probability density function (Pdf) estimation. In this paper, to find a closed form solution for the maximum entropy problem with any number of moment constraints, the entropy is consi...
REBMEC: Repeat Based Maximum Entropy Classifier for Biological Sequences
An important problem in biological data analysis is to predict the family of a newly discovered sequence like a protein or DNA sequence, using the collection of available sequences. In this paper we tackle this problem and present REBMEC, a Repeat Based Maximum Entropy Classifier of biological sequences. Maximum entropy models are known to be theoretically robust and yield high accuracy, but ar...